
Increase initialCapacity for HashSet in ExtractDependencies.scala #18219

Closed · wants to merge 2 commits

Conversation

@lolgab (Contributor) commented Jul 16, 2023

Growing that HashSet seems to take 1.27% of the allocations when compiling dotty itself.

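A minimal sketch of where those allocations come from (my illustration, not part of the thread, assuming Scala 2.13's mutable.HashSet with its default initial capacity of 16 and load factor of 0.75): a default-sized set reallocates its backing table several times on the way up, while a pre-sized one pays for its table once.

  import scala.collection.mutable

  object GrowthDemo {
    def main(args: Array[String]): Unit = {
      // Default construction: the table starts at 16 slots and doubles once
      // the element count crosses tableLength * loadFactor, so filling it
      // with 90 elements reallocates the backing array three times.
      val grown = new mutable.HashSet[Int]
      (1 to 90).foreach(grown += _)

      // Pre-sized construction, as in this PR: a single 128-slot table is
      // allocated up front and does not regrow while the size stays at or
      // below roughly 96 (128 * 0.75).
      val presized = new mutable.HashSet[Int](
        initialCapacity = 128,
        loadFactor = mutable.HashSet.defaultLoadFactor)
      (1 to 90).foreach(presized += _)
    }
  }
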
@lolgab (Contributor, Author) commented Jul 16, 2023

test performance please

@nicolasstucki (Contributor) commented:
test performance please

@dottybot (Member) commented:

performance test scheduled: 239 job(s) in queue, 1 running.

@@ -447,7 +447,7 @@ private class ExtractDependenciesCollector extends tpd.TreeTraverser { thisTreeT
     // Avoid cycles by remembering both the types (testcase:
     // tests/run/enum-values.scala) and the symbols of named types (testcase:
     // tests/pos-java-interop/i13575) we've seen before.
-    val seen = new mutable.HashSet[Symbol | Type]
+    val seen = new mutable.HashSet[Symbol | Type](initialCapacity = 128, loadFactor = mutable.HashSet.defaultLoadFactor)
@nicolasstucki (Contributor) commented Jul 17, 2023

128 seems a bit excessive. If we look at the histogram of maximum sizes, we see that 64 is enough to cover most cases (99%). Even with 32 we would cover 87% of cases and pay one extra allocation for the remaining 13% (the larger cases).

[Histogram screenshot (Screenshot 2023-07-17 at 10 08 39): distribution of maximum `seen` set sizes]

I measured it when running scala3-bootstrapped/compile after a clean.

@lolgab (Contributor, Author) replied:

@nicolasstucki Nice histogram! I wanted to build one too but gave up eventually. How did you do it?
Do your percentages consider the loadFactor of 75%? Should I update the initialCapacity to 64?

@nicolasstucki (Contributor) replied:

  def addTypeDependency(tpe: Type)(using Context): Unit = {
    val traverser = new TypeDependencyTraverser {
      def addDependency(symbol: Symbol) = addMemberRefDependency(symbol)
    }
    traverser.traverse(tpe)
    println("seen: " + traverser.seen.size) // instrumentation: log the final size of the `seen` set
  }

  def addPatMatDependency(tpe: Type)(using Context): Unit = {
    val traverser = new TypeDependencyTraverser {
      def addDependency(symbol: Symbol) =
        if (!ignoreDependency(symbol) && symbol.is(Sealed)) {
          val usedName = symbol.zincMangledName
          addUsedName(usedName, UseScope.PatMatTarget)
        }
    }
    traverser.traverse(tpe)
    println("seen: " + traverser.seen.size) // instrumentation: log the final size of the `seen` set
  }

sbt "clean; scala3-bootstrapped/compile" > sbtOutput.txt then filter out seen: from that file and import the numbers in google sheets.

@nicolasstucki (Contributor) replied:

I did not consider the load factor. My percentages only take into account the size of the set. I would go for 64 if we want to minimize those allocations.
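One detail worth spelling out (my note, assuming Scala 2.13 semantics, where the table grows once the element count reaches roughly tableLength * loadFactor): the usable capacity before the first resize is lower than initialCapacity.

  import scala.collection.mutable

  object ThresholdDemo {
    def main(args: Array[String]): Unit = {
      // Approximate resize thresholds under the default load factor (0.75).
      val loadFactor = mutable.HashSet.defaultLoadFactor
      for (capacity <- Seq(32, 64, 128))
        println(s"initialCapacity $capacity holds ~${(capacity * loadFactor).toInt} elements before resizing")
      // initialCapacity 32 holds ~24 elements before resizing
      // initialCapacity 64 holds ~48 elements before resizing
      // initialCapacity 128 holds ~96 elements before resizing
    }
  }

Under that reading, initialCapacity = 64 still pays one resize for sets in the 49-64 range, so the 99% figure above is about set sizes rather than resize-free coverage; covering every size up to 64 without regrowth would take the originally proposed 128.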

64 is enough to cover almost 99% of the cases, so it's a good trade-off between memory consumption and allocation count.
@lolgab marked this pull request as ready for review July 17, 2023 09:04
@nicolasstucki (Contributor) commented:
test performance please

@dottybot (Member) commented:

performance test scheduled: 239 job(s) in queue, 1 running.

@dottybot (Member) commented:

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/18219/ to see the changes.

Benchmarks are based on merging with main (2753a17).

@nicolasstucki (Contributor) commented:

There are 3 months of non-benchmarked commits (#18221 (comment)). The performance regression could have happened before this change.

@mbovel (Member) commented Jul 17, 2023

There are 3 months of non-benchmarked commits

These are being benchmarked. It should take less than two weeks. I suggest you wait for the missing info before checking whether this PR is responsible for the regression.

@bishabosha (Member) commented Jul 19, 2023

Here is my before-and-after comparison of CPU time (on a warmed-up scala3-compiler-bootstrapped):

before:
[Profiler screenshot (Screenshot 2023-07-19 at 16 21 42)]

after:
[Profiler screenshot (Screenshot 2023-07-19 at 16 22 06)]

So traversal now takes 47% of the phase's CPU time instead of 52% (the bottleneck is now more memory-bound: allocating Node objects in the hash set).

I also noticed on the side that recordDependency spends a lot of time interacting with the filesystem since the Tasty classpath changes.

@lolgab (Contributor, Author) commented Aug 12, 2023

Can you trigger another performance test now?

@bishabosha (Member) commented:

I'm also working on a more thorough optimisation for this phase, but sure, let's do the bench.

@bishabosha (Member) commented:

test performance with #sbt please

@dottybot (Member) commented:

performance test scheduled: 3 job(s) in queue, 1 running.

@dottybot (Member) commented:

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/18219/ to see the changes.

Benchmarks are based on merging with main (ca6a80e).

bishabosha added a commit that referenced this pull request Sep 22, 2023
alternative to #18219

reduces allocations in the phase by 77% (on `scala3-compiler-bootstrapped`); the phase's contribution to allocations across the whole compilation pipeline drops from about 12% to 4%.

Reduced allocations come from:
- reusing a single set for the `seen` cache.
- using linear-probe maps (dotty.tools.dotc.util), avoiding the allocation of nodes.
- no `ClassDependency` allocation each time a dependency exists (many are likely duplicated).

Performance is also improved by less expensive hashing (`EqHashMap`) and the introduction of `EqHashSet`, which also reduce the number of times we hash (the `add` method on HashSet, and an optimised `getOrElseUpdate`), and probably also by the reduced heap pressure from allocating less.

Now the majority of allocation for this phase comes from calling into Zinc.
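To make the first two bullets concrete, here is a minimal hypothetical sketch (not the actual #18403 code); java.util.IdentityHashMap stands in for dotty's EqHashSet, since both compare keys by reference and keep entries in a flat linear-probe array instead of allocating a Node per element:

  import java.util.IdentityHashMap

  // Hypothetical reusable `seen` cache: one instance lives for the whole
  // phase and is cleared between traversals, instead of allocating a new
  // mutable.HashSet on every call.
  final class SeenCache {
    private val seen = new IdentityHashMap[AnyRef, AnyRef]()

    // True the first time `x` is seen; put() both tests and inserts, so
    // the key is hashed only once (the same motivation as the optimised
    // getOrElseUpdate mentioned above).
    def tryAdd(x: AnyRef): Boolean = seen.put(x, x) == null

    // clear() nulls out the existing table rather than reallocating it,
    // so the capacity reached in earlier traversals is retained.
    def reset(): Unit = seen.clear()
  }

  // Usage sketch:
  //   val cache = new SeenCache
  //   if (cache.tryAdd(tpe)) { /* record dependency, recurse */ }
  //   cache.reset() // before the next traversal
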
@bishabosha (Member) commented:

closing now that #18403 is merged

@bishabosha closed this Oct 11, 2023
@lolgab deleted the bigger-table-hash-set branch October 11, 2023 12:57